MARS: Multimacro Architecture SRAM CIM-Based Accelerator With Co-Designed Compressed Neural Networks
Authors
Abstract
Convolutional neural networks (CNNs) play a key role in deep learning applications. However, the large storage overhead and substantial computation cost of CNNs are problematic for hardware accelerators. Computing-in-memory (CIM) architecture has demonstrated great potential to effectively compute large-scale matrix-vector multiplication. However, the intensive multiply-and-accumulate (MAC) operations executed at the crossbar array and the limited capacity of CIM macros remain bottlenecks for further improvement of energy efficiency and throughput. To reduce costs, network pruning and quantization are two widely studied compression methods that shrink the model size, but most compression algorithms can only be implemented on digital-based CNN accelerators. For implementation on a static random access memory (SRAM) CIM-based accelerator, the algorithm must consider the limitations of CIM macros, such as the number of word lines and bit lines that can be turned on at the same time, as well as how to map the weights to the SRAM CIM macro. In this study, a software/hardware co-design approach is proposed to design an SRAM CIM-based accelerator together with a CIM-aware compression algorithm. To lessen the high-precision MAC operations required by batch normalization (BN), a technique that fuses BN into the weights is proposed. Furthermore, to reduce the number of parameters, a sparsity method that considers the constraints of CIM macros is adopted. Last, MARS, a CIM-based accelerator that can utilize multiple SRAM CIM macros as processing units and support a compressed network, is proposed.
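The BN-fusion step mentioned in the abstract is the standard transform that absorbs a batch-normalization layer's scale and shift into the preceding convolution's weights and bias, so no separate high-precision BN MAC is needed at inference. The sketch below illustrates that generic transform only; the function name, tensor shapes, and `eps` value are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def fold_bn_into_conv(W, b, gamma, beta, mean, var, eps=1e-5):
    """Return (W_fold, b_fold) such that, for any input x,
    BN(conv(x, W) + b) == conv(x, W_fold) + b_fold.

    W has shape (out_ch, in_ch, kh, kw); b, gamma, beta, mean, var
    are per-output-channel vectors of length out_ch.
    """
    scale = gamma / np.sqrt(var + eps)        # per-channel BN scale
    W_fold = W * scale[:, None, None, None]   # scale each output filter
    b_fold = (b - mean) * scale + beta        # fold BN shift into the bias
    return W_fold, b_fold
```

For a 1x1 convolution this reduces to scaling the rows of a matrix, which makes the equivalence easy to check: `scale * (W @ x + b - mean) + beta` equals `W_fold @ x + b_fold` by construction.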
Similar resources
An accelerator for neural networks with pulse-coded model neurons
The labeling of features by synchronization of spikes seems to be a very efficient encoding scheme for a visual system. Simulation of a vision system with millions of pulse-coded model neurons, however, is almost impossible on the base of available processors including parallel processors and neurocomputers. A "one-to-one" silicon implementation of pulse-coded model neurons suffers from communi...
Computational Throughput of Accelerator Units with Application to Neural Networks
The size of data that can be fitted with a statistical model becomes restrictive when accounting for hidden dynamical effects, but approximations can be computed using loosely coupled computations mainly limited by computational throughput. This whitepaper describes scalability results attained by implementing one approximate approach using accelerator technology identified in the PRACE deliver...
Stitch-X: An Accelerator Architecture for Exploiting Unstructured Sparsity in Deep Neural Networks
Sparse deep neural network (DNN) accelerators exploit the intrinsic redundancy in data representation to achieve high performance and energy efficiency. However, sparse weight and input activation arrays are unstructured, and their processing cannot take advantage of the regular data-access patterns offered by dense arrays, thus the processing incurs increased complexities in dataflow orchestra...
Extending the Model Driven Architecture with a pre-CIM level
Whilst the successful alignment of business strategy and IT development is an important topic, there are still few ways that this is possible. The Model Driven Architecture (MDA) shows promise as an approach but is focussed firmly in the IT domain at the level of the Platform Independent and Platform Specific Models. The Computation Independent Model (CIM) is targetted at business users, but th...
Journal
Journal title: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Year: 2022
ISSN: 1937-4151, 0278-0070
DOI: https://doi.org/10.1109/tcad.2021.3082107